ABSTRACT


It is often necessary to estimate the population distribution of a random variate from a sample of observed values. Standard parametric families may not provide a satisfactory fit to the data. A polynomial family is constructed by assuming that the distribution function G is a constrained polynomial of the cumulative distribution F of a convenient parametric family. Polynomial families offer great flexibility in data fitting, while retaining the important feature of parametric families that information in the data is condensed into a moderate number of values.
EXECUTIVE SUMMARY


The Armed Services Vocational Aptitude Battery (ASVAB) consists of multiple-choice tests. New kinds of computerized tests are being developed and evaluated. Distributions of scores on these new tests can be very different from those of scores on multiple-choice tests. The same is true of the computerized adaptive version of the ASVAB. Distributions observed in raw data contain sampling error. Smoothing these distributions helps reduce the errors in statistical analyses and is also useful in displaying the distributions.

The  actual  score  distribution  may  not  belong  to  any  of  the  familiar 
families  of  distributions.  In  such  a  case,  one  can  begin  with  a 
suitable  family  and  then  generalize  it.  In  the  generalized  family,  the 
cumulative  distribution  function  G  is  a  polynomial  of  F,  the 
distribution  function  of  the  original  family.  This  approach  can 
generate  distributions  with  a  wide  variety  of  shapes.  This  research 
contribution  presents  some  theory  of  such  general  families  of 
distributions.
INTRODUCTION


It  is  often  necessary  to  estimate  the  distribution  of  a  random 
variate X from a sample of observations. Standard parametric families
may  not  provide  satisfactory  fit  to  the  data.  For  example,  the 
distribution  may  be  multimodal.  We  can  use  nonparametric  density 
estimation,  but  then  we  lose  the  convenience  of  summarizing  the 
information  in  the  data  in  a  moderate  number  of  values.  It  would  be 
useful  to  achieve  a  compromise  between  standard  parametric  families  and 
nonparametric  methods.  This  can  be  done  by  defining  a  family  in  which 
the  number  of  parameters  can  be  increased  indefinitely  until 
satisfactory  fit  to  the  data  is  obtained. 

POLYNOMIAL  FAMILIES 

Let $F(x, \theta)$ be any parametric cumulative distribution function, where $\theta$ is a vector of parameters. Let $G(x, \theta, a)$ be a polynomial of $F$ of the form

$$G = F + \sum_{k=1}^{p} a_k\, g_k(F), \qquad (1)$$

where the function $g_k$ is a polynomial of degree $k + 1$ that contains the factor $F(1 - F)$. The coefficients $a_k$ are such that $G$ is monotone nondecreasing in $F$ on $(0, 1)$. The factor $F(1 - F)$ ensures that $G = 0$ when $F = 0$, and $G = 1$ when $F = 1$. Thus $G$, too, is a cumulative distribution function and hence can be used for fitting observed data. The distribution $F$, which is the special case of $G$ with all $a_k = 0$, will be referred to as the "base" distribution.

The functions $g_k$ can be of the simple form

$$g_k(F) = F(1 - F)\,F^{k-1}, \qquad (2)$$

but then successive terms of the polynomial in equation 1 are strongly correlated, which can lead to ill-conditioned matrices and numerical instabilities while estimating the coefficients. Better expressions for these functions are given in appendix A. They are a compromise between simplicity and spreading out the zeroes of the polynomials. If ninth-degree polynomials fail to yield a good fit, one should probably try a different base distribution or try transforming the data.
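To illustrate equations 1 and 2, here is a minimal Python sketch. It uses the simple basis of equation 2, not the better-conditioned functions of appendix A, and the function names `g_simple` and `G_of_F` are hypothetical. It shows that the factor $F(1 - F)$ fixes $G$ at the end points whatever the coefficients are. Monotonicity in between is a separate constraint on $a$.

```python
from typing import Sequence


def g_simple(k: int, F: float) -> float:
    """Simple basis of equation 2: g_k(F) = F(1 - F) F**(k - 1), of degree k + 1."""
    return F * (1.0 - F) * F ** (k - 1)


def G_of_F(F: float, a: Sequence[float]) -> float:
    """Polynomial family of equation 1: G = F + sum_k a_k g_k(F)."""
    return F + sum(a_k * g_simple(k, F) for k, a_k in enumerate(a, start=1))


# Illustrative coefficients (p = 2); G(0) = 0 and G(1) = 1 for any choice of a.
a = [0.5, -0.8]
print([round(G_of_F(F, a), 4) for F in (0.0, 0.25, 0.5, 0.75, 1.0)])
```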

In  principle,  the  base  distribution  can  have  any  form  with  any 
number  of  parameters.  In  practice,  its  choice  depends  on  ease  of 
computing  F  and  its  derivatives  and  on  its  suitability  for  the  data  in 
hand.  The  normal  distribution  is  a  natural  choice  if  x  can  take  any 
real  value.  If  x  can  take  only  positive  values,  the  Weibull  is  more 
convenient  than  the  gamma  distribution.  The  beta  distribution  is  a 
natural  (although  not  convenient)  choice  if  x  has  a  known  finite 
range. The variable X can be discrete as well as continuous. Then, depending on the nature of X, the base distribution may be binomial, hypergeometric, Poisson, and so on. Appendix B provides formulas for fitting distributions of scores on multiple-choice tests, using the negative hypergeometric as the base distribution.

In data fitting, the value of $p$ may be set a priori or determined in a stepwise fashion. In the latter case, we begin by fitting $F$ alone. Then, for each successive value of $p$, we reestimate all parameters (including those of $F$) and decide whether addition of the last term provides a significant improvement in fit to the data.
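The stepwise procedure can be sketched as a loop that stops once adding a term no longer produces a significant drop in the minimized objective. This sketch rests on assumptions the report does not state. The fitting routine is passed in as a hypothetical callable `fit_q(p)` that refits all parameters and returns the minimized objective (for example $Q$, or $-2$ times the log likelihood). Significance is judged against 3.841, the 5 percent critical value of chi-square with 1 degree of freedom.

```python
from typing import Callable

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom


def choose_degree(fit_q: Callable[[int], float], p_max: int = 8) -> int:
    """Stepwise choice of p: start from the base distribution (p = 0) and add
    one polynomial term at a time while the improvement is significant.

    fit_q(p) is a hypothetical user-supplied routine that refits *all*
    parameters (theta and a_1..a_p) and returns the minimized objective.
    """
    q_prev = fit_q(0)
    for p in range(1, p_max + 1):
        q_new = fit_q(p)
        if q_prev - q_new < CHI2_1DF_5PCT:  # last term not significant
            return p - 1
        q_prev = q_new
    return p_max


# Made-up objective values, for illustration only.
fake_q = {0: 210.0, 1: 120.0, 2: 40.0, 3: 39.2, 4: 38.9}
print(choose_degree(lambda p: fake_q[p], p_max=4))
```

The difference test assumes nested models and large samples. In practice the threshold, or the test itself, may need to be chosen differently.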

Polynomial  families  are  useful  because  they  are  extremely  flexible. 
Given  the  freedom  in  choosing  the  base  distribution  as  well  as  the 
degree  of  the  polynomial,  a  wide  variety  of  shapes  can  be  obtained.  In 
the  illustration  presented  later  in  this  report,  a  Weibull  base  and  only 
two coefficients in the polynomial provide an excellent fit to a bimodal distribution.

MAXIMUM  LIKELIHOOD  ESTIMATION 

Maximum likelihood estimation (MLE) has optimal asymptotic properties. With polynomial families, however, MLE can create a computational problem. During iterative fitting, the coefficients $a$ may be such that the polynomial for $G$ is not monotone at every observed value of $x$. As a result, the density may be negative or zero at some values of $x$, and its logarithm then does not exist. The algorithm that calculates the log likelihood must check for this possibility, and the maximization routine must contain steps to deal with the problem if it arises. Appendix A gives equations for MLE by the Newton-Raphson procedure.
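Here is a minimal sketch of the check just described. It assumes a normal base and the first two basis functions of appendix A ($g_1 = F(1 - F)$, $g_2 = F^2(1 - F)$), and the function name is hypothetical. The fitted density is $f(x)\,G'(F(x))$, so the sketch returns $-\infty$ whenever $G'$ is not positive at some observation. A maximizer can then treat that trial step as rejected.

```python
import math
from typing import Sequence


def log_likelihood(xs: Sequence[float], mu: float, sigma: float,
                   a1: float, a2: float) -> float:
    """Log likelihood of a normal-base polynomial family with p = 2.

    Returns -inf if the fitted density is not positive at some x, the
    situation that can arise during iterative fitting.
    """
    total = 0.0
    for x in xs:
        z = (x - mu) / sigma
        F = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal cdf
        log_f = -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        # G'(F) = 1 + a1 g1'(F) + a2 g2'(F), with g1' = 1 - 2F and g2' = F(2 - 3F)
        g_prime = 1.0 + a1 * (1.0 - 2.0 * F) + a2 * F * (2.0 - 3.0 * F)
        if g_prime <= 0.0:
            return -math.inf
        total += log_f + math.log(g_prime)
    return total
```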

MINIMUM  CHI-SQUARE  ESTIMATION 

Computations are substantially simpler if, instead of maximum likelihood, we use minimum chi-square with the objective function defined as follows. Let $N$ be the sample size, $m \le N + 1$ a positive integer, $x_{(1)}, x_{(2)}, \ldots, x_{(N)}$ the order statistics, and

$$0 = \lambda_0 < \lambda_1 < \lambda_2 < \cdots < \lambda_{m-1} < \lambda_m = 1. \qquad (3)$$

Let $N_h = [(N + 1)\lambda_h]$, where $[y]$ is the largest integer not exceeding $y$, and $n_h = N_h - N_{h-1}$. The $\lambda$'s must be such that each $n_h > 0$. By definition, $x_{(0)}$ and $x_{(N+1)}$ are the smallest and largest possible values of $x$, so that $G(x_{(0)}, \theta, a) = 0$ and $G(x_{(N+1)}, \theta, a) = 1$. The quantity to be minimized is

$$Q = \sum_{h=1}^{m} \frac{\big[(N + 1)\{G(x_{(N_h)}, \theta, a) - G(x_{(N_{h-1})}, \theta, a)\} - n_h\big]^2}{n_h}. \qquad (4)$$
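Equations 3 and 4 can be sketched in a few lines. This version assumes uniformly spaced $\lambda_h = h/m$ and passes the fitted cdf in as a Python callable. The function and argument names are illustrative. The smallest and largest possible values of $x$ supply $x_{(0)}$ and $x_{(N+1)}$.

```python
from typing import Callable, Sequence


def min_chi_square_q(sample: Sequence[float], G: Callable[[float], float],
                     m: int, x_min: float, x_max: float) -> float:
    """Objective Q of equation 4, assuming lambda_h = h/m (uniform spacing).

    G is the fitted cdf; x_min and x_max play the roles of x_(0) and x_(N+1),
    so that G(x_min) = 0 and G(x_max) = 1.
    """
    N = len(sample)
    if not 1 <= m <= N + 1:
        raise ValueError("need 1 <= m <= N + 1")
    x = [x_min] + sorted(sample) + [x_max]           # x_(0), ..., x_(N+1)
    # N_h = [(N + 1) h / m] by integer floor, so N_0 = 0 and N_m = N + 1.
    N_h = [(N + 1) * h // m for h in range(m + 1)]
    Q = 0.0
    for h in range(1, m + 1):
        n_h = N_h[h] - N_h[h - 1]
        if n_h <= 0:
            raise ValueError("each n_h must be positive")
        spacing = G(x[N_h[h]]) - G(x[N_h[h - 1]])   # quantity in curly brackets
        Q += ((N + 1) * spacing - n_h) ** 2 / n_h
    return Q
```

For example, $N = 19$ values placed at $i/20$ and fitted by the uniform cdf $G(t) = t$ give spacings exactly equal to $n_h/(N + 1)$, so $Q = 0$.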
Although there is no rule for choosing the $\lambda$'s, it seems desirable to space them uniformly so that $Q$ is equally sensitive to different parts of the distribution.

If the parameters $\theta$ and $a$ are known, the quantity in curly brackets is a spacing of order $n_h$, with expected value $n_h/(N + 1)$.

When the parameters are unknown, we can estimate them by minimizing $Q$. Bofinger [1] has shown that, in the asymptotic limit when $N \to \infty$ while $m$ and the $\lambda$'s remain fixed, $Q$ has a chi-square distribution. Therefore, parameter estimation by minimizing $Q$ will be called the "minimum chi-square" method.

Unless  we  want  to  test  goodness  of  fit  between  G  and  the  data, 
the  distribution  of  Q  with  a  finite  sample  does  not  matter.  We  may 
even  take  m  =  N  +  1  and  use  spacings  of  order  one,  if  computational 
cost  is  not  a  concern.  Finite-sample  properties  of  the  estimator  have 
to  be  determined  by  Monte  Carlo  methods  and  are  beyond  the  scope  of  this 
paper.  The  important  practical  point  is  that  Q  can  be  computed  even 
if  G  is  not  monotone,  and  hence  no  special  precautions  are  needed  in 
the  calculations.  If  G  is  decreasing  in  some  interval  of  F  in 
(0,1),  that  merely  worsens  fit  to  the  data  and  increases  the  value  of 
Q.  Experience  has  shown,  however,  that  minimizing  Q  may  yield  small 
negative  slopes  at  end  points.  Therefore,  it  is  necessary  to  determine 
whether  derivatives  of  the  polynomial  in  equation  1  are  nonnegative  at 
F  =  0  and  F  =  1.  If  a  derivative  is  negative,  it  is  set  equal  to  zero 
by  changing  the  coefficients. 

Another benefit of using $Q$ is the following. $Q$ is a quadratic function of the coefficients $a$. If the parameters $\theta$ of $F$ are held fixed, minimization over the coefficients is achieved by solving simultaneous linear equations, which is a major simplification of the calculations. In addition, constraining the derivatives at $F = 0$ and $F = 1$ is much easier in linear equations than in nonlinear fitting.

Apart  from  computational  convenience,  Q  has  another  advantage 
over  MLE.  It  is  well  known  that  MLE  lacks  robustness  because  a  single 
extreme  value  can  dominate  the  likelihood  function  and  hence  the 
estimates  of  parameters.  In  contrast,  Q  uses  not  the  observed  values 
themselves  but  their  transforms  to  the  probability  metric.  The 
transformed  value  of  an  observation,  no  matter  how  extreme,  lies  between 
0  and  1  and  hence  cannot  dominate  the  objective  function. 

ILLUSTRATION

In  the  Infantry  phase  of  the  Marine  Corps  Job  Performance  Project, 
1,976  Marines  were  administered  a  video  firing  game  as  a  test  of 
eye-hand  coordination.  The  score  on  this  test  (rescaled  to  obtain  a 
mean  near  100)  was  fitted  with  a  Weibull  base  and  a  cubic  polynomial 
(p = 2) by minimum chi-square. Despite the large sample, the minimized
chi-square with 45 degrees of freedom was only 38.7. The parameters of
the  Weibull  were  1.58  and  101.24;  the  polynomial  coefficients  were  0.48 
and  -1.90.  Figure  1  shows  a  histogram  and  the  fitted  distribution. 
Clearly,  an  excellent  fit  has  been  obtained  with  only  four  parameters. 
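The sketch below makes the illustration concrete. It evaluates the fitted density $f(x)\,G'(F(x))$ of equation A-9 using the reported estimates. It rests on two assumptions the text does not state explicitly: that 1.58 is the Weibull shape and 101.24 the scale (the parameter order used in appendix A), and that the coefficients refer to the appendix A basis $g_1 = F(1 - F)$, $g_2 = F^2(1 - F)$. Under those assumptions $G'$ stays positive on $[0, 1]$. A grid search also finds two local maxima of the density, consistent with the bimodal shape described in the text.

```python
import math

SHAPE, SCALE = 1.58, 101.24   # reported Weibull parameters (order assumed)
A1, A2 = 0.48, -1.90          # reported polynomial coefficients


def g_prime(F: float) -> float:
    """dG/dF = 1 + a1 g1'(F) + a2 g2'(F), with g1' = 1 - 2F and g2' = F(2 - 3F)."""
    return 1.0 + A1 * (1.0 - 2.0 * F) + A2 * F * (2.0 - 3.0 * F)


def fitted_density(x: float) -> float:
    """Weibull base density f(x) times G'(F(x)), as in equation A-9."""
    z = (x / SCALE) ** SHAPE
    F = 1.0 - math.exp(-z)
    f = SHAPE / SCALE * (x / SCALE) ** (SHAPE - 1.0) * math.exp(-z)
    return f * g_prime(F)


def density_modes(step: float = 0.5, x_max: float = 400.0) -> list:
    """Interior local maxima of the fitted density on a grid of x values."""
    xs = [step * i for i in range(1, int(x_max / step) + 1)]
    d = [fitted_density(x) for x in xs]
    return [xs[i] for i in range(1, len(d) - 1) if d[i - 1] < d[i] > d[i + 1]]


# Smallest slope of G on a fine grid of F (positive, so G is monotone).
print(min(g_prime(i / 1000) for i in range(1001)))
print(density_modes())
```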

APPLICATIONS  OF  POLYNOMIAL  FAMILIES 

The  primary  use  of  polynomial  families,  as  illustrated  above,  is  to 
obtain  good  fit  to  the  sample  distribution  within  the  parametric 
framework.  A  stepwise  fit  would  be  used  in  most  applications.  If 
asymptotic  properties  of  maximum  likelihood  estimates  are  to  be  invoked, 
the  degree  of  the  polynomial  needs  to  be  specified  in  advance. 

Polynomial  families  are  also  useful  for  testing  goodness  of  fit. 
Tests for normality are based on skewness and kurtosis. Corresponding tests can be constructed as follows. If we begin with a normal base and then add only the quadratic term $g_1$, we obtain a skewed distribution.

If the added term is statistically significant, the null hypothesis of normality is rejected in favor of a skewed distribution. Suppose we know or assume that the true distribution is symmetric. Then we can add only the symmetric cubic term $g_1 = (2F - 1)F(1 - F)$ of appendix A (equation A-11a) and test whether the kurtosis is the same as that of the normal. We can also test skewness and kurtosis simultaneously by adding the general terms $g_1$ and $g_2$ together.

These  tests  based  on  polynomial  families  have  two  major  advantages 
over  conventional  procedures.  First,  if  the  null  hypothesis  is 
rejected,  we  can  fit  an  alternative  distribution  that  fits  better  than 
the  normal  one  does.  Second,  the  procedure  is  completely  general:  it 
can  be  used  with  any  base  distribution  whatsoever  (e.g.,  logistic  or 
Cauchy  instead  of  normal).  If  the  likelihood  ratio  or  chi-square  test 
is  used,  the  asymptotic  distribution  of  the  test  statistic  is  the  same 
for  all  base  distributions.  (The  finite-sample  properties  of  the  test 
statistic  will  probably  depend  on  the  base  distribution.) 

Thus,  polynomial  families  provide  a  flexible  and  hence  powerful 
approach  to  fitting  and  testing  univariate  distributions,  discrete  as 
well  as  continuous. 




REFERENCE 


[1] Eve Bofinger. "Goodness-of-Fit Test Using Sample Quantiles." Journal of the Royal Statistical Society, Series B (1973): 277-284.




APPENDIX  A 


MATHEMATICAL  DETAILS  FOR  CONTINUOUS  VARIABLES 

g  FUNCTIONS  AND  THEIR  DERIVATIVES 

The distribution function in the polynomial family is

$$G(x, \theta, a) = F(x, \theta) + \sum_{k=1}^{p} a_k\, g_k\{F(x, \theta)\},$$

where $F$ is the base distribution function with parameter vector $\theta$.

To simplify computations as well as formulas, we define

$$h = 2F - 1, \qquad H = F(1 - F), \qquad T = 3 - 16H.$$

Hence

$$h' = 2, \qquad H' = -h, \qquad T' = 16h,$$

where a prime denotes the derivative with respect to $F$.




The functions in the polynomial and their derivatives are

$$g_1 = H, \qquad g_1' = -h, \qquad g_1'' = -2, \qquad g_1''' = 0; \qquad \text{(A-1a to A-1d)}$$

$$g_2 = FH, \qquad g_2' = F(2 - 3F), \qquad g_2'' = 2 - 6F, \qquad g_2''' = -6; \qquad \text{(A-2a to A-2d)}$$

$$g_3 = H^2, \qquad g_3' = -2hH, \qquad g_3'' = 2 - 12H, \qquad g_3''' = 12h; \qquad \text{(A-3a to A-3d)}$$

$$g_4 = h\,g_3, \qquad g_4' = h\,g_3' + 2g_3, \qquad g_4'' = h\,g_3'' + 4g_3', \qquad g_4''' = h\,g_3''' + 6g_3''; \qquad \text{(A-4a to A-4d)}$$

$$g_5 = T g_3, \qquad g_5' = T g_3' + 16 g_4, \qquad g_5'' = T g_3'' + 32(g_4' - g_3), \qquad g_5''' = T g_3''' + 48 g_4'' - 96 g_3'; \qquad \text{(A-5a to A-5d)}$$

$$g_6 = h\,g_5, \qquad g_6' = 2g_5 + h\,g_5', \qquad g_6'' = 4g_5' + h\,g_5'', \qquad g_6''' = 6g_5'' + h\,g_5'''; \qquad \text{(A-6a to A-6d)}$$

$$g_7 = T g_5, \qquad g_7' = T g_5' + 16 g_6, \qquad g_7'' = T g_5'' + 32(g_6' - g_5), \qquad g_7''' = T g_5''' + 48 g_6'' - 96 g_5'; \qquad \text{(A-7a to A-7d)}$$

$$g_8 = h\,g_7, \qquad g_8' = 2g_7 + h\,g_7', \qquad g_8'' = 4g_7' + h\,g_7'', \qquad g_8''' = 6g_7'' + h\,g_7'''. \qquad \text{(A-8a to A-8d)}$$
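The recursions A-1 to A-8 can be coded directly. The sketch below (the function name `g_all` is hypothetical) returns $g_k$ and its first three derivatives with respect to $F$ for $k = 1, \ldots, 8$. It builds each function from the previous ones as in the equations above. Comparing each derivative with a finite difference of the one before it is a convenient check on the formulas.

```python
from typing import List, Tuple

Derivs = Tuple[float, float, float, float]  # (g, g', g'', g''') at one value of F


def g_all(F: float) -> List[Derivs]:
    """Return [(g_k, g_k', g_k'', g_k''') for k = 1..8], equations A-1 to A-8."""
    h, H = 2.0 * F - 1.0, F * (1.0 - F)
    T = 3.0 - 16.0 * H
    g: list = [None] * 9                      # 1-based: g[k] = (value, d1, d2, d3)
    g[1] = (H, -h, -2.0, 0.0)
    g[2] = (F * H, F * (2.0 - 3.0 * F), 2.0 - 6.0 * F, -6.0)
    g[3] = (H * H, -2.0 * h * H, 2.0 - 12.0 * H, 12.0 * h)
    for k in (4, 6, 8):
        v, d1, d2, d3 = g[k - 1]
        # g_k = h g_{k-1} (A-4, A-6, A-8), using h' = 2
        g[k] = (h * v, h * d1 + 2.0 * v, h * d2 + 4.0 * d1, h * d3 + 6.0 * d2)
        if k < 8:
            w1, w2 = g[k][1], g[k][2]
            # g_{k+1} = T g_{k-1} (A-5, A-7), using T' = 16h and g_k = h g_{k-1}
            g[k + 1] = (T * v, T * d1 + 16.0 * g[k][0],
                        T * d2 + 32.0 * (w1 - v),
                        T * d3 + 48.0 * w2 - 96.0 * d1)
    return g[1:]
```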


Apart from $g_1$, the forms of these functions are not unique or even optimal in any sense. They do have a convenient feature: $g_1'$ equals 1 at $F = 0$ and $-1$ at $F = 1$, and $g_2'$ equals $-1$ at $F = 1$. All other first derivatives (including $g_2'$ at $F = 0$) vanish at the end points. This makes it easy to impose monotonicity at the end points, where the constraint requires that derivatives not be negative. Since $dG/dF = 1 + \sum_k a_k g_k'$, the corresponding conditions on the coefficients, for $F = 0$ and $F = 1$, are

$$1 + a_1 \ge 0, \quad \text{i.e.,} \quad a_1 \ge -1,$$

and

$$1 - (a_1 + a_2) \ge 0, \quad \text{i.e.,} \quad a_1 + a_2 \le 1.$$
Let $f(x)$ be the density $dF/dx$. The fitted density is

$$\frac{dG}{dx} = f\Big[1 + \sum_{k=1}^{p} a_k\, g_k'(F)\Big], \qquad \text{(A-9)}$$

where the prime indicates a derivative with respect to $F$.


FUNCTIONS FOR SYMMETRIC DISTRIBUTIONS


Sometimes it is known (or assumed) that the underlying distribution is symmetric. Let the expression for $G$ be

$$G = F + \sum_{k=1}^{p} a_k\, g_k(F), \qquad \text{(A-10)}$$

where $g_k$ is of degree $2k + 1$. To ensure that $G$ is a symmetric distribution for all values of the coefficients, the base distribution must be symmetric and each function $g_k$ in the polynomial must be an odd function of $(F - 1/2)$. Let $M$ be the median of $F$. Then

$$g_k\{F(M)\} = g_k(1/2) = 0$$

for all $k$, and hence

$$G(M) = F(M) + 0 = 1/2.$$

Moreover, each $g_k'$ is an even function of $(F - 1/2)$, and $F(M + t) - 1/2 = 1/2 - F(M - t)$ for a symmetric base. Thus, by equation A-9, the density $dG/dx$ is an even function of $(x - M)$; i.e., distribution $G$ is symmetric about its median $M$.


A convenient choice of polynomials is as follows.

$$g_1 = hH, \qquad g_1' = 6H - 1, \qquad g_1'' = -6h, \qquad g_1''' = -12; \qquad \text{(A-11a to A-11d)}$$

$$g_2 = hH^2, \qquad g_2' = H(10H - 2), \qquad g_2'' = 2h - 20hH, \qquad g_2''' = 4 - 20 g_1'; \qquad \text{(A-12a to A-12d)}$$

$$g_3 = T g_2, \qquad g_3' = T g_2' + 16h\,g_2, \qquad g_3'' = T g_2'' + 32(h\,g_2' + g_2), \qquad g_3''' = T g_2''' + 48h\,g_2'' + 96 g_2'; \qquad \text{(A-13a to A-13d)}$$

$$g_4 = T g_3, \qquad g_4' = T g_3' + 16h\,g_3, \qquad g_4'' = T g_3'' + 32(h\,g_3' + g_3), \qquad g_4''' = T g_3''' + 48h\,g_3'' + 96 g_3'. \qquad \text{(A-14a to A-14d)}$$

Although written in a different form, functions $g_2$, $g_3$, and $g_4$ are the same as $g_4$, $g_6$, and $g_8$, respectively, of the general family. Since $g_1' = -1$ at both $F = 0$ and $F = 1$, while all other first derivatives vanish there, the nonnegative slope of the polynomial at $F = 0$ and $F = 1$ requires that $a_1 \le 1$.
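Here is a short numerical sketch of the symmetric basis (helper names are illustrative). Each $g_k$ is odd in $F - 1/2$, so $G(1/2) = 1/2$ and $G(1 - F) = 1 - G(F)$ for any coefficients. The end-point value $g_1' = -1$ is why the slope constraint becomes $a_1 \le 1$.

```python
from typing import List, Sequence


def sym_g(F: float) -> List[float]:
    """Symmetric basis of equations A-11a to A-14a: [g1, g2, g3, g4] at F."""
    h, H = 2.0 * F - 1.0, F * (1.0 - F)
    T = 3.0 - 16.0 * H
    g1 = h * H
    g2 = h * H * H
    g3 = T * g2
    g4 = T * g3
    return [g1, g2, g3, g4]


def sym_G(F: float, a: Sequence[float]) -> float:
    """G = F + sum_k a_k g_k(F) with the symmetric basis (equation A-10)."""
    return F + sum(ak * gk for ak, gk in zip(a, sym_g(F)))


def sym_g1_prime(F: float) -> float:
    """g1' = 6H - 1 (equation A-11b); equals -1 at both end points."""
    return 6.0 * F * (1.0 - F) - 1.0
```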


MAXIMUM  LIKELIHOOD  ESTIMATION 

Let $f(x, \theta)$ be the density function of the base distribution and $\ell(x, \theta)$ its natural logarithm. The derivative of the fitted distribution function $G$ with respect to $F$ is

$$G'(x, \theta, a) = 1 + \sum_k a_k\, g_k'\{F(x, \theta)\}. \qquad \text{(A-15)}$$

Unless otherwise stated, sums over $k$ are from 1 to $p$ and sums over $i$ are from 1 to $n$. Hence the log likelihood of a sample containing values $x_i$, $i = 1, 2, \ldots, n$, is

$$LL = \sum_i \ell(x_i, \theta) + \sum_i \log[G'(x_i, \theta, a)]. \qquad \text{(A-16)}$$

$LL$ is the function to be maximized. The Newton-Raphson method requires first and second derivatives of $LL$. Therefore, in estimation by maximum likelihood, we need derivatives of $g_k$ with respect to $F$ up to the third order, and derivatives of $F$ and $\ell$ with respect to their parameters up to the second order.

Let $\theta_r$ denote a parameter of $F$. For example, if $F$ is a normal distribution, $\theta_1$ is the mean and $\theta_2$ the standard deviation. Subscripts $r$ and $s$ will be used with $\theta$, and subscripts $j$, $k$, and $l$ for coefficients and $g$ functions in the polynomial. Subscript $i$ will indicate an observation. Arguments $x_i$, $\theta$, and $a$ will be suppressed. $\partial F/\partial\theta_r$ and $\partial^2 F/\partial\theta_r\,\partial\theta_s$ will be abbreviated as $\partial_r F$ and $\partial_r\partial_s F$, and the derivatives of $\ell$ will be treated similarly. The derivative

$$\partial G'/\partial\theta_r = \Big(\sum_k a_k\, g_k''\Big)\,\partial F/\partial\theta_r$$

will be denoted by $\partial_r G'$.
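The first derivatives of the log likelihood are

$$\partial LL/\partial\theta_r = \sum_i \big[\partial_r \ell + \partial_r G'/G'\big] \qquad \text{(A-17)}$$

and

$$\partial LL/\partial a_j = \sum_i g_j'/G'. \qquad \text{(A-18)}$$

The second derivatives are

$$\frac{\partial^2 LL}{\partial\theta_r\,\partial\theta_s} = \sum_i \Big[\partial_r\partial_s \ell + \Big(\sum_k a_k g_k''\Big)\partial_r\partial_s F/G' - \partial_r G'\,\partial_s G'/G'^2 + \partial_r F\,\partial_s F\Big(\sum_k a_k g_k'''\Big)\Big/G'\Big], \qquad \text{(A-19)}$$

$$\frac{\partial^2 LL}{\partial a_j\,\partial a_l} = -\sum_i g_j'\,g_l'/G'^2, \qquad \text{(A-20)}$$

and

$$\frac{\partial^2 LL}{\partial\theta_r\,\partial a_j} = \sum_i \Big[G'\,g_j'' - g_j' \sum_k a_k g_k''\Big]\,\partial_r F/G'^2. \qquad \text{(A-21)}$$

Maximizing the likelihood using these equations yields simultaneous estimates of $\theta$ and the coefficients.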
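To check equations A-15, A-16, and A-18, the sketch below computes the log likelihood for a normal base with $p = 2$. It then compares the analytic derivative $\partial LL/\partial a_j = \sum_i g_j'/G'$ with a central finite difference. The basis is the appendix A pair $g_1 = H$, $g_2 = FH$. The helper names are hypothetical, and the sample values used in the check are arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def _F_logf(x: float, mu: float, sigma: float) -> Tuple[float, float]:
    """Normal base: cdf F and log density l at x."""
    z = (x - mu) / sigma
    F = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return F, -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def g_primes(F: float) -> Tuple[float, float]:
    """g_1' = 1 - 2F and g_2' = F(2 - 3F) (equations A-1b and A-2b)."""
    return 1.0 - 2.0 * F, F * (2.0 - 3.0 * F)


def loglik(xs: Sequence[float], mu: float, sigma: float, a: Sequence[float]) -> float:
    """Equation A-16: LL = sum_i l(x_i) + sum_i log G'(x_i), with p = 2."""
    total = 0.0
    for x in xs:
        F, logf = _F_logf(x, mu, sigma)
        gp = g_primes(F)
        total += logf + math.log(1.0 + a[0] * gp[0] + a[1] * gp[1])
    return total


def dloglik_da(xs: Sequence[float], mu: float, sigma: float,
               a: Sequence[float]) -> List[float]:
    """Equation A-18: dLL/da_j = sum_i g_j' / G'."""
    grad = [0.0, 0.0]
    for x in xs:
        F, _ = _F_logf(x, mu, sigma)
        gp = g_primes(F)
        G_prime = 1.0 + a[0] * gp[0] + a[1] * gp[1]   # equation A-15
        for j in range(2):
            grad[j] += gp[j] / G_prime
    return grad
```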
Derivatives for Normal Base

The parameters of the normal distribution are the mean $\theta_1$ and the standard deviation $\theta_2$. The standard normal variate is

$$z = (x - \theta_1)/\theta_2.$$

The cdf depends on $x$ only through $z$:

$$F(x) = \Phi(z),$$

where $\Phi$ is the standard normal cdf. Partial derivatives of $z$ with respect to the parameters are

$$\partial_1 z = -1/\theta_2, \qquad \text{(A-22a)}$$

$$\partial_2 z = -z/\theta_2, \qquad \text{(A-22b)}$$

$$\partial_1\partial_1 z = 0, \qquad \text{(A-22c)}$$

$$\partial_2\partial_2 z = 2z/\theta_2^2, \qquad \text{(A-22d)}$$

and

$$\partial_2\partial_1 z = 1/\theta_2^2. \qquad \text{(A-22e)}$$
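Derivatives of $\Phi$ with respect to $z$ are

$$\Phi' = \phi = \exp(-z^2/2)/\sqrt{2\pi},$$

which is the standard normal density, and

$$\Phi'' = \phi' = -z\phi.$$

Using these derivatives of $z$ and of $\Phi$, those of $F(x)$ can be computed as follows:

$$\partial_1 F = \phi\,\partial_1 z = -\phi/\theta_2, \qquad \text{(A-23a)}$$

$$\partial_2 F = \phi\,\partial_2 z = -z\phi/\theta_2, \qquad \text{(A-23b)}$$

$$\partial_1\partial_1 F = -z\phi/\theta_2^2, \qquad \text{(A-23c)}$$

$$\partial_2\partial_2 F = -\phi z^3/\theta_2^2 + 2z\phi/\theta_2^2 = z\phi(2 - z^2)/\theta_2^2, \qquad \text{(A-23d)}$$

and

$$\partial_1\partial_2 F = \phi(1 - z^2)/\theta_2^2. \qquad \text{(A-23e)}$$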
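The normal-base formulas A-22 and A-23 can be checked numerically. This sketch (helper names are illustrative) codes $\partial_r F$ and $\partial_r\partial_s F$ from the closed forms above. It then compares them with central finite differences in $\theta_1$ (mean) and $\theta_2$ (standard deviation).

```python
import math
from typing import List, Tuple


def normal_cdf(x: float, t1: float, t2: float) -> float:
    """F(x) = Phi((x - t1)/t2)."""
    return 0.5 * math.erfc(-(x - t1) / (t2 * math.sqrt(2.0)))


def normal_F_derivs(x: float, t1: float, t2: float) -> Tuple[List[float], List[List[float]]]:
    """First and second parameter derivatives of F (equations A-23a to A-23e).

    Returns (dF, d2F) with dF[r] = d_r F and d2F[r][s] = d_r d_s F (0-based r, s).
    """
    z = (x - t1) / t2
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    dF = [-phi / t2, -z * phi / t2]                  # A-23a, A-23b
    d11 = -z * phi / t2 ** 2                         # A-23c
    d22 = z * phi * (2.0 - z * z) / t2 ** 2          # A-23d
    d12 = phi * (1.0 - z * z) / t2 ** 2              # A-23e
    return dF, [[d11, d12], [d12, d22]]
```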
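The density of $x$ is

$$f(x, \theta) = \phi(z)/\theta_2,$$

and hence the log density is

$$\ell(x, \theta) = -z^2/2 - \log(\theta_2) - [\log(2\pi)]/2.$$

Therefore, its partial derivatives are

$$\partial_1 \ell = -z\,\partial_1 z = z/\theta_2, \qquad \text{(A-24a)}$$

$$\partial_2 \ell = -z\,\partial_2 z - 1/\theta_2 = z^2/\theta_2 - 1/\theta_2, \qquad \text{(A-24b)}$$

$$\partial_1\partial_1 \ell = (\partial_1 z)/\theta_2 = -1/\theta_2^2, \qquad \text{(A-24c)}$$

$$\partial_2\partial_2 \ell = 2z\,\partial_2 z/\theta_2 - z^2/\theta_2^2 + 1/\theta_2^2 = -3z^2/\theta_2^2 + 1/\theta_2^2, \qquad \text{(A-24d)}$$

$$\partial_2\partial_1 \ell = \partial_2 z/\theta_2 - z/\theta_2^2 = -2z/\theta_2^2. \qquad \text{(A-24e)}$$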
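Derivatives for Weibull Base

The Weibull cdf is

$$F(x, \theta) = 1 - \exp(-z), \qquad \text{(A-25a)}$$

where

$$z = (x/\theta_2)^{\theta_1}. \qquad \text{(A-25b)}$$

Thus, $\theta_1$ and $\theta_2$ are the shape and scale parameters. Derivatives of $z$ are

$$\partial_1 z = z \log(x/\theta_2), \qquad \text{(A-26a)}$$

$$\partial_2 z = -\theta_1 z/\theta_2, \qquad \text{(A-26b)}$$

$$\partial_1\partial_1 z = \log(x/\theta_2)\,\partial_1 z, \qquad \text{(A-26c)}$$

$$\partial_2\partial_2 z = -\theta_1\,\partial_2 z/\theta_2 + \theta_1 z/\theta_2^2 = -(\theta_1 + 1)\,\partial_2 z/\theta_2, \qquad \text{(A-26d)}$$

$$\partial_2\partial_1 z = \partial_2 z\,\log(x/\theta_2) - z/\theta_2 = \{\log(x/\theta_2) + 1/\theta_1\}\,\partial_2 z. \qquad \text{(A-26e)}$$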
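Derivatives of $F$ with respect to the parameters are

$$\partial_1 F = (1 - F)\,\partial_1 z, \qquad \text{(A-27a)}$$

$$\partial_2 F = (1 - F)\,\partial_2 z, \qquad \text{(A-27b)}$$

$$\partial_1\partial_1 F = (1 - F)\,\partial_1 z\,[\log(x/\theta_2) - \partial_1 z], \qquad \text{(A-27c)}$$

$$\partial_2\partial_2 F = -(1 - F)\,[(\theta_1 + 1)/\theta_2 + \partial_2 z]\,\partial_2 z, \qquad \text{(A-27d)}$$

and

$$\partial_2\partial_1 F = (1 - F)\,[1/\theta_1 - \partial_1 z + \log(x/\theta_2)]\,\partial_2 z. \qquad \text{(A-27e)}$$

The density function is

$$f(x, \theta) = \theta_1 (x/\theta_2)^{\theta_1 - 1} \exp(-z)/\theta_2,$$

and its log is

$$\ell(x, \theta) = \log(\theta_1) + (\theta_1 - 1)\log(x) - \theta_1 \log(\theta_2) - z,$$

which has derivatives

$$\partial_1 \ell = 1/\theta_1 + \log(x/\theta_2) - \partial_1 z, \qquad \text{(A-28a)}$$

$$\partial_2 \ell = -\theta_1/\theta_2 - \partial_2 z = \theta_1(z - 1)/\theta_2, \qquad \text{(A-28b)}$$

$$\partial_1\partial_1 \ell = -1/\theta_1^2 - \partial_1\partial_1 z, \qquad \text{(A-28c)}$$

$$\partial_2\partial_2 \ell = \theta_1/\theta_2^2 - \partial_2\partial_2 z, \qquad \text{(A-28d)}$$

and

$$\partial_2\partial_1 \ell = -1/\theta_2 - \partial_2\partial_1 z. \qquad \text{(A-28e)}$$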

Estimation  by  Minimum  Chi-Square 

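The quantity $Q$ to be minimized, defined in equation 4 in the main text, is

$$Q = \sum_h \frac{\big[(N + 1)\{G(x_{(N_h)}, \theta, a) - G(x_{(N_{h-1})}, \theta, a)\} - n_h\big]^2}{n_h}.$$

Because $\sum_h n_h = N + 1$ and the differences of $G$ sum to $G(x_{(N+1)}) - G(x_{(0)}) = 1$, this simplifies to

$$Q = (N + 1)^2 \sum_h \frac{\{G(x_{(N_h)}, \theta, a) - G(x_{(N_{h-1})}, \theta, a)\}^2}{n_h} - (N + 1).$$

Sums over $h$ are from 1 to $m$. Denote $F(x_{(N_h)}, \theta)$ by $F_h$ and $G(x_{(N_h)}, \theta, a)$ by $G_h$. Then

$$Q = (N + 1)^2 \sum_h \frac{\big[(F_h - F_{h-1}) + \sum_k a_k\{g_k(F_h) - g_k(F_{h-1})\}\big]^2}{n_h} - (N + 1). \qquad \text{(A-29)}$$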
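The simplification leading to A-29 uses only $\sum_h n_h = N + 1$ and $G_m - G_0 = 1$. This quick sketch confirms numerically that the two forms of $Q$ agree. The cell values are arbitrary illustrations.

```python
from typing import Sequence


def q_direct(G: Sequence[float], n: Sequence[int], N: int) -> float:
    """Q as in equation 4: sum_h [(N + 1)(G_h - G_{h-1}) - n_h]^2 / n_h."""
    return sum(((N + 1) * (G[h] - G[h - 1]) - n[h - 1]) ** 2 / n[h - 1]
               for h in range(1, len(G)))


def q_simplified(G: Sequence[float], n: Sequence[int], N: int) -> float:
    """Simplified form: (N + 1)^2 sum_h (G_h - G_{h-1})^2 / n_h - (N + 1).

    Valid only when G_0 = 0, G_m = 1 and sum_h n_h = N + 1.
    """
    return (N + 1) ** 2 * sum((G[h] - G[h - 1]) ** 2 / n[h - 1]
                              for h in range(1, len(G))) - (N + 1)


# Illustrative values: m = 6 cells and N + 1 = 60 gaps.
G_demo = [0.0, 0.1, 0.25, 0.4, 0.7, 0.85, 1.0]
n_demo = [5, 12, 8, 15, 10, 10]
print(q_direct(G_demo, n_demo, 59), q_simplified(G_demo, n_demo, 59))
```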

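First, let us minimize $Q$ over the coefficients $a$ while the parameters $\theta$ of $F$ are fixed:

$$\frac{\partial Q}{\partial a_j} = 2(N + 1)^2 \sum_h \frac{\big[(F_h - F_{h-1}) + \sum_k a_k\{g_k(F_h) - g_k(F_{h-1})\}\big]\{g_j(F_h) - g_j(F_{h-1})\}}{n_h}.$$

Setting these derivatives equal to zero yields simultaneous linear equations of the form

$$\sum_k C_{jk}\, a_k = c_j, \qquad \text{(A-30a)}$$

where

$$c_j = -\sum_h (F_h - F_{h-1})\{g_j(F_h) - g_j(F_{h-1})\}/n_h \qquad \text{(A-30b)}$$

and

$$C_{jk} = \sum_h \{g_j(F_h) - g_j(F_{h-1})\}\{g_k(F_h) - g_k(F_{h-1})\}/n_h. \qquad \text{(A-30c)}$$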
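Here is a minimal numpy sketch of equations A-30a to A-30c for $p = 2$, using the appendix A basis $g_1 = H$, $g_2 = FH$. The function names and the synthetic self-check are illustrative assumptions. The check builds values $F_h$ at which a known $G$ hits $h/m$ exactly, with equal $n_h$. The differences of $G$ are then proportional to $n_h$, so the known coefficients minimize $Q$ and the solve should recover them.

```python
import numpy as np


def basis(F: np.ndarray) -> np.ndarray:
    """Columns g_1(F) = F(1 - F) and g_2(F) = F^2 (1 - F) (equations A-1a, A-2a)."""
    H = F * (1.0 - F)
    return np.column_stack([H, F * H])


def solve_coefficients(F_h: np.ndarray, n_h: np.ndarray) -> np.ndarray:
    """Solve sum_k C_jk a_k = c_j (A-30a) with theta, and hence F_h, held fixed.

    F_h holds F_0 = 0, F_1, ..., F_m = 1 and n_h holds n_1, ..., n_m.
    """
    dF = np.diff(F_h)                    # F_h - F_{h-1}
    dg = np.diff(basis(F_h), axis=0)     # g_j(F_h) - g_j(F_{h-1})
    w = 1.0 / n_h
    C = dg.T @ (dg * w[:, None])         # equation A-30c
    c = -dg.T @ (dF * w)                 # equation A-30b
    return np.linalg.solve(C, c)


def invert_G(t: float, a: np.ndarray, tol: float = 1e-14) -> float:
    """Bisection for F in [0, 1] with G(F) = t; assumes G is increasing."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        G_mid = mid + (basis(np.array([mid])) @ a)[0]
        lo, hi = (mid, hi) if G_mid < t else (lo, mid)
    return 0.5 * (lo + hi)
```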


To obtain derivatives of the coefficients with respect to $\theta_r$, we differentiate equation A-30a and rearrange terms to obtain

$$\sum_k C_{jk}\,\partial_r a_k = \partial_r c_j - \sum_k \partial_r C_{jk}\, a_k \qquad \text{(A-31a)}$$

and

$$\sum_k C_{jk}\,\partial_r\partial_s a_k = \partial_r\partial_s c_j - \sum_k \partial_s C_{jk}\,\partial_r a_k - \sum_k \partial_r C_{jk}\,\partial_s a_k - \sum_k \partial_r\partial_s C_{jk}\, a_k. \qquad \text{(A-31b)}$$


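Derivatives needed in these equations are given below. For brevity, for any quantity $u$ evaluated at $x_{(N_h)}$ and $x_{(N_{h-1})}$, write $\Delta_h u = u_h - u_{h-1}$. For example, $\Delta_h g_j = g_j(F_h) - g_j(F_{h-1})$ and $\Delta_h(g_j'\,\partial_r F) = g_j'(F_h)\,\partial_r F_h - g_j'(F_{h-1})\,\partial_r F_{h-1}$.

$$\partial_r c_j = -\sum_h \big[\Delta_h(\partial_r F)\,\Delta_h g_j + \Delta_h F\,\Delta_h(g_j'\,\partial_r F)\big]/n_h, \qquad \text{(A-32a)}$$

$$\partial_r\partial_s c_j = -\sum_h \big[\Delta_h(\partial_r\partial_s F)\,\Delta_h g_j + \Delta_h(\partial_r F)\,\Delta_h(g_j'\,\partial_s F) + \Delta_h(\partial_s F)\,\Delta_h(g_j'\,\partial_r F) + \Delta_h F\,\Delta_h(g_j'\,\partial_r\partial_s F + g_j''\,\partial_r F\,\partial_s F)\big]/n_h, \qquad \text{(A-32b)}$$

$$\partial_r C_{jk} = \sum_h \big[\Delta_h(g_j'\,\partial_r F)\,\Delta_h g_k + \Delta_h g_j\,\Delta_h(g_k'\,\partial_r F)\big]/n_h, \qquad \text{(A-33a)}$$

$$\partial_r\partial_s C_{jk} = \sum_h \big[\Delta_h(g_j'\,\partial_r\partial_s F + g_j''\,\partial_r F\,\partial_s F)\,\Delta_h g_k + \Delta_h(g_j'\,\partial_s F)\,\Delta_h(g_k'\,\partial_r F) + \Delta_h(g_j'\,\partial_r F)\,\Delta_h(g_k'\,\partial_s F) + \Delta_h g_j\,\Delta_h(g_k'\,\partial_r\partial_s F + g_k''\,\partial_r F\,\partial_s F)\big]/n_h. \qquad \text{(A-33b)}$$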


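At any given $\theta$, one computes the coefficients in the polynomial and then the sum of squares $Q$. Thus, the coefficients and $Q$ are functions of $\theta$.

The derivatives of $Q$ are given by

$$\frac{1}{2(N + 1)^2}\,\partial_r Q = \sum_h (G_h - G_{h-1})\Big[(\partial_r F_h - \partial_r F_{h-1}) + \sum_k a_k\{g_k'(F_h)\,\partial_r F_h - g_k'(F_{h-1})\,\partial_r F_{h-1}\} + \sum_k \partial_r a_k\{g_k(F_h) - g_k(F_{h-1})\}\Big]\Big/n_h \qquad \text{(A-34a)}$$

and

$$\begin{aligned}
\frac{1}{2(N + 1)^2}\,\partial_r\partial_s Q = {} & \sum_h \big[(\partial_r G_h - \partial_r G_{h-1})(\partial_s G_h - \partial_s G_{h-1}) + (G_h - G_{h-1})(\partial_r\partial_s F_h - \partial_r\partial_s F_{h-1})\big]/n_h \\
& + \sum_h (G_h - G_{h-1})\Big[\sum_k a_k\{\partial_r\partial_s F_h\, g_k'(F_h) + \partial_r F_h\,\partial_s F_h\, g_k''(F_h) - \partial_r\partial_s F_{h-1}\, g_k'(F_{h-1}) - \partial_r F_{h-1}\,\partial_s F_{h-1}\, g_k''(F_{h-1})\} \\
& \qquad + \sum_k \partial_r a_k\{\partial_s F_h\, g_k'(F_h) - \partial_s F_{h-1}\, g_k'(F_{h-1})\} + \sum_k \partial_s a_k\{\partial_r F_h\, g_k'(F_h) - \partial_r F_{h-1}\, g_k'(F_{h-1})\} \\
& \qquad + \sum_k \partial_r\partial_s a_k\{g_k(F_h) - g_k(F_{h-1})\}\Big]\Big/n_h,
\end{aligned} \qquad \text{(A-34b)}$$

where $(\partial_r G_h - \partial_r G_{h-1})$ is the quantity in square brackets in equation A-34a, and $(\partial_s G_h - \partial_s G_{h-1})$ is defined similarly. These derivatives are used to estimate $\theta$ by minimizing $Q$ with the Newton-Raphson procedure.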

Constraint  at  F  =  0 

The derivative of $G$ with respect to $F$ must be nonnegative at $F = 0$. This requires that $a_1 \ge -1$. If we obtain $a_1 < -1$ after solving the linear equations, the value obtained is replaced by $-1$. If $p = 1$ (i.e., if no terms higher than the quadratic are present), we set $\partial_r a_1 = \partial_r\partial_s a_1 = 0$ for all $r$ and $s$ and then proceed to calculate $Q$ and its derivatives.

If $p > 1$, the polynomial is reexpressed in the form

$$G = F - g_1(F) + \sum_{k=1}^{p-1} b_k\, e_k(F), \qquad \text{(A-35a)}$$

where

$$e_k(F) = g_{k+1}(F). \qquad \text{(A-35b)}$$

Equations for the coefficients $b$ are of the form

$$\sum_{k=1}^{p-1} D_{jk}\, b_k = d_j, \qquad \text{(A-36a)}$$

where

$$D_{jk} = C_{j+1,\,k+1} \qquad \text{(A-36b)}$$

and

$$d_j = c_{j+1} + C_{j+1,\,1}. \qquad \text{(A-36c)}$$

The derivatives of $D$ and $d$, and hence those of $b$, can be computed from those of $C$ and $c$. Then we have $a_1 = -1$, $\partial_r a_1 = \partial_r\partial_s a_1 = 0$, and, for $j > 1$, $a_j = b_{j-1}$, $\partial_r a_j = \partial_r b_{j-1}$, and $\partial_r\partial_s a_j = \partial_r\partial_s b_{j-1}$.

For symmetric distributions, the constraint is $a_1 \le 1$. Equations for imposing this constraint are similar to those above for the general case, with $a_1 = +1$, $-g_1$ replaced by $+g_1$ in equation A-35a, and a negative sign for $C_{j+1,1}$ in equation A-36c.

Because of symmetry, derivatives at $F = 0$ and $F = 1$ are equal, and hence the derivative at $F = 1$ need not be checked separately.
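Once $C$ and $c$ are available, fixing $a_1 = -1$ only changes the right-hand side. Here is a numpy sketch of equations A-36a to A-36c (the function name is illustrative). As a check, its result can be compared with an ordinary weighted least-squares fit in which $g_1$ enters with the fixed coefficient $-1$.

```python
import numpy as np


def solve_with_a1_fixed(C: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Impose a_1 = -1 (zero slope at F = 0) through D b = d (A-36a to A-36c).

    Returns the full coefficient vector (a_1, a_2, ..., a_p).
    """
    D = C[1:, 1:]                          # D_jk = C_{j+1,k+1}
    d = c[1:] + C[1:, 0]                   # d_j = c_{j+1} + C_{j+1,1}
    b = np.linalg.solve(D, d)
    return np.concatenate(([-1.0], b))     # a_1 = -1, a_j = b_{j-1}
```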

Constraint  at  F  =  1 

The derivative of $G$ with respect to $F$ at $F = 1$ must be nonnegative, which requires $a_1 + a_2 \le 1$. If this condition is violated, we set $a_1 = 1 - a_2$ and write

$$G = F + g_1 + a_2(g_2 - g_1) + \sum_{k=3}^{p} a_k g_k = F + g_1 + \sum_{k=1}^{p-1} b_k e_k, \qquad \text{(A-37a)}$$

where

$$e_1 = g_2 - g_1, \qquad \text{(A-37b)}$$

$$e_k = g_{k+1} \quad \text{if } k > 1, \qquad \text{(A-37c)}$$

and

$$b_k = a_{k+1}. \qquad \text{(A-37d)}$$

Equations for $b_k$ have the form in equation A-36a with

$$D_{jk} = C_{j+1,\,k+1} \quad \text{if } j, k > 1, \qquad \text{(A-38a)}$$

$$D_{1k} = D_{k1} = C_{2,\,k+1} - C_{1,\,k+1} \quad \text{if } k > 1, \qquad \text{(A-38b)}$$

$$D_{11} = C_{22} + C_{11} - 2C_{12}, \qquad \text{(A-38c)}$$

$$d_k = c_{k+1} - C_{1,\,k+1} \quad \text{if } k > 1, \qquad \text{(A-38d)}$$

and

$$d_1 = c_2 - c_1 - (C_{12} - C_{11}). \qquad \text{(A-38e)}$$

After the coefficients $b$ and their derivatives have been computed, the original coefficients are obtained as

$$a_1 = 1 - b_1, \qquad \text{(A-39a)}$$

$$a_{k+1} = b_k, \quad k \ge 1, \qquad \text{(A-39b)}$$

and similarly for their derivatives.
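The same kind of check works for the constraint at $F = 1$. This numpy sketch (function name illustrative) assembles $D$ and $d$ from $C$ and $c$ as in equations A-38a to A-38e, solves A-36a, and recovers the coefficients with A-39. The result can be compared with a direct least-squares fit in the reparameterized basis $e_1 = g_2 - g_1$, $e_k = g_{k+1}$.

```python
import numpy as np


def solve_with_slope_zero_at_1(C: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Impose a_1 + a_2 = 1 (zero slope at F = 1) via equations A-37 to A-39.

    C and c are the p x p matrix and p-vector of equations A-30b and A-30c.
    """
    p = len(c)
    D = np.empty((p - 1, p - 1))
    d = np.empty(p - 1)
    D[1:, 1:] = C[2:, 2:]                          # A-38a
    D[0, 1:] = D[1:, 0] = C[1, 2:] - C[0, 2:]      # A-38b (D is symmetric)
    D[0, 0] = C[1, 1] + C[0, 0] - 2.0 * C[0, 1]    # A-38c
    d[1:] = c[2:] - C[0, 2:]                       # A-38d
    d[0] = c[1] - c[0] - (C[0, 1] - C[0, 0])       # A-38e
    b = np.linalg.solve(D, d)
    return np.concatenate(([1.0 - b[0]], b))       # A-39a, A-39b
```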


Constraints  at  F  =  0  and  F  =  1 


If a zero slope has to be imposed at both end points (so that $a_1 = -1$ and $a_2 = 2$), we write

$$G = F - g_1 + 2g_2 + \sum_{k=1}^{p-2} b_k\, e_k, \qquad \text{(A-40a)}$$

where

$$e_k = g_{k+2}. \qquad \text{(A-40b)}$$

The coefficients are obtained by solving

$$\sum_{k=1}^{p-2} D_{jk}\, b_k = d_j. \qquad \text{(A-41a)}$$


APPENDIX B


MATHEMATICAL  DETAILS  FOR  DISTRIBUTION  OF  TEST  SCORES 


INTRODUCTION 


Consider a test containing $n$ items. The test score $x$ is the number of items answered correctly, so that $0 \le x \le n$. A convenient base distribution for test scores is the beta binomial distribution, generated as follows. Let $T$ have a beta distribution with parameters $a$ and $b'$. Conditional on $T = t$, let the distribution of $x$ be binomial with mean $nt$. Then, integration over $t$ shows that the marginal probability of score $x$ is

$$f(x) = \frac{\Gamma(n + 1)}{\Gamma(x + 1)\,\Gamma(n - x + 1)} \cdot \frac{\Gamma(x + a)\,\Gamma(n - x + b')}{\Gamma(n + a + b')} \cdot \frac{\Gamma(a + b')}{\Gamma(a)\,\Gamma(b')}.$$

This is a special case of the hypergeometric distribution, called the negative hypergeometric.

Following Lord and Novick [B-1, section 23.6], it is convenient to replace parameter $b'$ with

$$b = b' + n - 1.$$


The parameters of the distribution can be calculated from the mean $\mu$ and standard deviation $\sigma$ as follows:

$$a = \mu(-1 + 1/\alpha) \qquad \text{(B-2a)}$$

and

$$b = -a - 1 + n/\alpha, \qquad \text{(B-2b)}$$

where

$$\alpha = \frac{n}{n - 1}\Big[1 - \frac{\mu(n - \mu)}{n\sigma^2}\Big]. \qquad \text{(B-2c)}$$

Estimates of $a$ and $b$ can be obtained by replacing $\mu$ and $\sigma$ with the sample mean and standard deviation.
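Here are equations B-2a to B-2c in code, as a sketch (the function name is hypothetical). The expression for $\alpha$ is the Kuder-Richardson formula 21. The check uses the exact moments of a beta binomial with $a = 2$, $b' = 3$, $n = 20$ (mean 8, variance 20). For these, the formulas should return $a = 2$ and $b = b' + n - 1 = 22$.

```python
import math
from typing import Tuple


def beta_binomial_params(mean: float, sd: float, n: int) -> Tuple[float, float]:
    """Equations B-2a to B-2c: a and b (with b = b' + n - 1) from the mean and
    standard deviation of scores on an n-item test."""
    alpha = (n / (n - 1)) * (1.0 - mean * (n - mean) / (n * sd * sd))  # B-2c
    a = mean * (-1.0 + 1.0 / alpha)                                      # B-2a
    b = -a - 1.0 + n / alpha                                             # B-2b
    return a, b


print(beta_binomial_params(8.0, math.sqrt(20.0), 20))
```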


CALCULATION  OF  PROBABILITIES 

The score probabilities can be calculated by recursion. On rewriting equation 23.6.4 in [B-1], the ratio of the probabilities of successive scores is given by

$$\frac{f(x + 1)}{f(x)} = \frac{(n - x)(a + x)}{(x + 1)(b - x)}, \qquad 0 \le x \le n - 1. \qquad \text{(B-3)}$$

Denote this ratio by $w(x)$. Let $u$ be some score in the middle of the distribution, say the largest integer smaller than the mean. Let $v(u) = 1$,

$$v(x + 1) = w(x)\,v(x), \qquad u \le x \le n - 1, \qquad \text{(B-4a)}$$

and

$$v(x) = v(x + 1)/w(x), \qquad 0 \le x < u. \qquad \text{(B-4b)}$$

Then the score probabilities are given by

$$f(x) = v(x) \Big/ \sum_{y=0}^{n} v(y), \qquad \text{(B-5a)}$$

and then the cumulative probabilities by

$$F(x) = \sum_{y=0}^{x} f(y). \qquad \text{(B-5b)}$$
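Here is a minimal sketch of the recursion B-3 to B-5b (function names hypothetical). Starting at a central score $u$ keeps the intermediate values $v(x)$ moderate. The result can be compared with the closed form for $f(x)$ given above, evaluated with `math.lgamma`.

```python
import math
from itertools import accumulate
from typing import List


def beta_binomial_pmf(n: int, a: float, b: float) -> List[float]:
    """Score probabilities f(0), ..., f(n) by equations B-3 to B-5a.

    Uses the reparameterized b = b' + n - 1 of the text.
    """
    w = [(n - x) * (a + x) / ((x + 1) * (b - x)) for x in range(n)]   # B-3
    mean = n * a / (a + b - n + 1)            # n a / (a + b')
    u = min(max(math.ceil(mean) - 1, 0), n)   # largest integer below the mean
    v = [0.0] * (n + 1)
    v[u] = 1.0
    for x in range(u, n):                     # B-4a
        v[x + 1] = w[x] * v[x]
    for x in range(u - 1, -1, -1):            # B-4b
        v[x] = v[x + 1] / w[x]
    S = sum(v)
    return [vx / S for vx in v]               # B-5a


def beta_binomial_cdf(n: int, a: float, b: float) -> List[float]:
    """Cumulative probabilities F(0), ..., F(n) by equation B-5b."""
    return list(accumulate(beta_binomial_pmf(n, a, b)))
```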


DERIVATIVES OF PROBABILITIES


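The parameters of the distribution are $\theta_1 = a$ and $\theta_2 = b$. Therefore, using the same abbreviations for derivatives as in appendix A,

$$\partial_1 w(x) = w(x)/(a + x), \qquad \text{(B-6a)}$$

$$\partial_2 w(x) = -w(x)/(b - x), \qquad \text{(B-6b)}$$

$$\partial_1\partial_1 w(x) = 0, \qquad \text{(B-6c)}$$

$$\partial_2\partial_2 w(x) = 2\,w(x)/(b - x)^2, \qquad \text{(B-6d)}$$

and

$$\partial_1\partial_2 w(x) = -w(x)/[(a + x)(b - x)]. \qquad \text{(B-6e)}$$

All derivatives of $v(u)$ vanish, because $v(u) = 1$ for all parameter values. For $u \le x < n$, we have

$$\partial_r v(x + 1) = w(x)\,\partial_r v(x) + v(x)\,\partial_r w(x) \qquad \text{(B-7a)}$$

and

$$\partial_r\partial_s v(x + 1) = w(x)\,\partial_r\partial_s v(x) + \partial_r w(x)\,\partial_s v(x) + \partial_s w(x)\,\partial_r v(x) + v(x)\,\partial_r\partial_s w(x), \qquad \text{(B-7b)}$$

where each subscript can be 1 or 2. By rearranging equations B-7a and B-7b, corresponding equations useful at $x < u$ are found to be

$$\partial_r v(x) = [\partial_r v(x + 1) - v(x)\,\partial_r w(x)]/w(x) \qquad \text{(B-8a)}$$

and

$$\partial_r\partial_s v(x) = [\partial_r\partial_s v(x + 1) - \partial_r w(x)\,\partial_s v(x) - \partial_s w(x)\,\partial_r v(x) - v(x)\,\partial_r\partial_s w(x)]/w(x), \qquad \text{(B-8b)}$$

where derivatives of $w(x)$ are obtained from equations B-6a to B-6e.

To obtain derivatives of score probabilities, define the sum

$$S = \sum_{x=0}^{n} v(x),$$

and use equation B-5a to obtain

$$\partial_r f(x) = [\partial_r v(x) - f(x)\,\partial_r S]/S \qquad \text{(B-9a)}$$

and

$$\partial_r\partial_s f(x) = [\partial_r\partial_s v(x) - f(x)\,\partial_r\partial_s S - \partial_s f(x)\,\partial_r S - \partial_r f(x)\,\partial_s S]/S. \qquad \text{(B-9b)}$$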


ESTIMATION


The fitted cdf is

$$G(x, \theta, a) = F(x, \theta) + \sum_k a_k\, g_k[F(x, \theta)]. \qquad \text{(B-10)}$$

Hence the fitted score probabilities are given by

$$\mathrm{pr}(0, \theta, a) = F(0, \theta) + \sum_k a_k\, g_k[F(0, \theta)] \qquad \text{(B-11a)}$$

and, for $x > 0$,

$$\mathrm{pr}(x, \theta, a) = f(x, \theta) + \sum_k a_k\big[g_k\{F(x, \theta)\} - g_k\{F(x - 1, \theta)\}\big]. \qquad \text{(B-11b)}$$

These probabilities can be used for maximum likelihood estimation, but minimum chi-square is more convenient while being asymptotically equivalent to maximum likelihood.
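Because $f(x) = F(x) - F(x - 1)$, equations B-11a and B-11b say that the fitted probabilities are successive differences of $G$. Here is a short sketch, assuming the appendix A functions $g_1 = H$ and $g_2 = FH$ and taking the base cdf values as given (function names hypothetical). Since $g_k(1) = 0$ and $F(n) = 1$, the fitted probabilities always sum to one. They are all positive when $G$ is monotone.

```python
from typing import Callable, List, Sequence


def g_appendix_a(k: int, F: float) -> float:
    """g_1 = F(1 - F) and g_2 = F^2 (1 - F) from appendix A (only k = 1, 2 here)."""
    H = F * (1.0 - F)
    return {1: H, 2: F * H}[k]


def fitted_score_probs(F: Sequence[float], a: Sequence[float],
                       g: Callable[[int, float], float]) -> List[float]:
    """pr(x) for x = 0..n from base cumulative probabilities F(0..n) (B-10, B-11)."""
    G = [Fx + sum(ak * g(k, Fx) for k, ak in enumerate(a, start=1)) for Fx in F]
    return [G[0]] + [G[x] - G[x - 1] for x in range(1, len(G))]
```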

For minimum chi-square estimation, create $m \le n + 1$ cells or score groups by choosing scores $0 \le y_1 < y_2 < \cdots < y_m = n$ so that the observed frequency in each score group exceeds some value (say 10). Let $F_h = F(y_h)$ and $G_h = G(y_h)$, with $y_0 = -1$ and $F_0 = G_0 = 0$ by definition. Let $n_h$ be the observed frequency in cell $h$, which contains the scores $x$ given by $y_{h-1} + 1 \le x \le y_h$, so that $\sum_h n_h = N$, the sample size. The quantity to be minimized is

$$Q = \sum_h \frac{[N(G_h - G_{h-1}) - n_h]^2}{n_h}. \qquad \text{(B-12)}$$

Expressions for derivatives of $Q$ are the same as in appendix A, except that $(N + 1)$ (which in appendix A is the total number of gaps) is replaced by the sample size $N$.